Coding approaches to fault tolerance in dynamic systems

Author

  • Christoforos N. Hadjicostis
Abstract

A fault-tolerant system tolerates internal failures while preserving desirable overall behavior. Fault tolerance is necessary in life-critical or inaccessible applications, and it also enables the design of reliable systems out of unreliable, less expensive components. This thesis discusses fault tolerance in dynamic systems, such as finite-state controllers or computer simulations, whose internal state influences their future behavior. Modular redundancy (system replication) and other traditional techniques for fault tolerance are expensive and rely heavily, particularly in the case of dynamic systems operating over extended time horizons, on the assumption that the error-correcting mechanism (e.g., voting) is fault-free. The thesis develops a systematic methodology for adding structured redundancy to a dynamic system and introducing associated fault tolerance. Our approach exposes a wide range of possibilities between no redundancy and full replication. Assuming that the error-correcting mechanism is fault-free, we parameterize the different possibilities in various settings, including algebraic machines, linear dynamic systems and Petri nets. By adopting specific error models and, in some cases, by making explicit connections with hardware implementations, we demonstrate how the redundant systems can be designed to allow detection/correction of a fixed number of failures. We do not explicitly address optimization criteria that could be used in choosing among different redundant implementations, but our examples illustrate how such criteria can be investigated in future work. The last part of the thesis relaxes the traditional assumption that error correction be fault-free. We use unreliable system replicas and unreliable voters to construct redundant dynamic systems that evolve in time with low probability of failure. Our approach generalizes modular redundancy by using distributed voting schemes. Combining these techniques with low-complexity error-correcting coding, we are able to efficiently protect identical unreliable linear finite-state machines that operate in parallel on distinct input sequences. The approach requires only a constant amount of redundant hardware per machine to achieve a probability of failure that remains below any pre-specified bound over any given finite time interval.

Thesis Supervisor: George C. Verghese, Professor of Electrical Engineering
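The idea of structured redundancy between "no redundancy and full replication" can be illustrated with a small sketch. The example below is an assumption-laden toy, not the construction from the thesis: it takes a hypothetical two-state linear system q[t+1] = A q[t] + B x[t], adds a single checksum state equal to C q, and checks a parity relation after each step to detect a corrupted state variable. The matrices `A`, `B`, `C` and all function names are invented for illustration.

```python
# Toy sketch of a linear system protected by one checksum state.
# All matrices and names are hypothetical; they do not come from the thesis.

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(M, N):
    """Multiply matrix M by matrix N (both lists of rows)."""
    cols = list(zip(*N))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in M]

# Original system: q[t+1] = A q[t] + B x[t]
A = [[1, 1],
     [0, 1]]
B = [[0],
     [1]]

# Checksum matrix: the redundant state should always equal C q.
C = [[1, 1]]

n, r = len(A), len(C)

# Redundant system acting on the encoded state [q; C q].
# The checksum is recomputed from q on every step (zero columns on the right).
A_red = [row + [0] * r for row in A] + [row + [0] * r for row in matmul(C, A)]
B_red = B + matmul(C, B)

def encode(q):
    """Append the checksum C q to the original state q."""
    return list(q) + matvec(C, q)

def step(state, x):
    """Advance the redundant system by one time step with input x."""
    return [a + b for a, b in zip(matvec(A_red, state), matvec(B_red, x))]

def syndrome(state):
    """Return checksum minus C q; all zeros means no fault was detected."""
    q, check = state[:n], state[n:]
    return [c - e for c, e in zip(check, matvec(C, q))]

state = step(encode([1, 2]), [1])
print(state, syndrome(state))    # [3, 3, 6] [0]

faulty = state[:]
faulty[0] += 5                   # inject a fault into one state variable
print(faulty, syndrome(faulty))  # [8, 3, 6] [-5]
```

In this simple construction the check has to run after every step: because the checksum is recomputed from q, a corrupted state becomes self-consistent again after the next update. A single checksum only detects a fault in one state variable; locating and correcting it would need more redundant states.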

Similar resources

Coding Approaches to Fault Tolerance in Combinational and Dynamic Systems

Why wait several days to receive the book Coding Approaches to Fault Tolerance in Combinational and Dynamic Systems after ordering it? Why settle for that when a faster option is available? The same book can be found right here, and you can receive it directly after purchase. This coding approaches to fault tolerance in combinational and dynamic sys...

Fault-Tolerant Dynamic Scheduling of Object-Based Tasks in Multiprocessor Real-Time Systems

Multiprocessor systems are fast emerging as a powerful computing tool for real-time applications. The reliability required of real-time systems leads to the need for fault tolerance in such systems. One way of achieving fault tolerance is the Primary-Backup (PB) approach, in which two copies of a task are run on two different processors. In this paper we compare and contrast three basic PB approaches i...

An approach to fault detection and correction in design of systems using of Turbo codes

We present an approach to the design of fault-tolerant computing systems. In this paper, a technique is employed that enables the combination of several codes in order to obtain flexibility in the design of error-correcting codes. Code-combining techniques are very effective; turbo codes are one example of such codes. The algorithm-based fault tolerance techniques that detect errors rely on the c...

Fault-tolerant Dynamic Scheduling of Object-Based

Multiprocessor systems are fast emerging as a powerful computing tool for real-time applications. The reliability required of real-time systems leads to the need for fault tolerance in such systems. One way of achieving fault tolerance is the Primary-Backup (PB) approach, in which two copies of a task are run on two different processors. In this paper, we compare and contrast three basic PB approa...

Stability Assessment Metamorphic Approach (SAMA) for Effective Scheduling based on Fault Tolerance in Computational Grid

Grid computing allows coordinated and controlled resource sharing and problem solving in multi-institutional, dynamic virtual organizations. Moreover, fault tolerance and task scheduling are important issues for large-scale computational grids because of the unreliable nature of grid resources. A commonly exploited technique to realize fault tolerance is periodic checkpointing, which periodically ...

Publication date: 1999